Downsampling and feature extraction are essential procedures for 3D point cloud understanding. Existing methods are limited by the inconsistent point densities of different parts in the point cloud. In this work, we analyze the limitation of the downsampling stage and propose the pre-abstraction group-wise window-normalization module. In particular, the window-normalization method is leveraged to unify the point densities in different parts. Furthermore, the group-wise strategy is proposed to obtain multi-type features, including texture and spatial information. We also propose the pre-abstraction module to balance local and global features. Extensive experiments show that our module performs better on several tasks. In segmentation tasks on S3DIS (Area 5), the proposed module performs better on small object recognition, and the results have more precise boundaries than others. The recognition of the sofa and the column is improved from 69.2% to 84.4% and from 42.7% to 48.7%, respectively. The benchmarks are improved from 71.7%/77.6%/91.9% (mIoU/mAcc/OA) to 72.2%/78.2%/91.4%. The accuracies of 6-fold cross-validation on S3DIS are 77.6%/85.8%/91.7%. It outperforms the best model PointNeXt-XL (74.9%/83.0%/90.3%) by 2.7% on mIoU and achieves state-of-the-art performance. The code and models are available at https://github.com/DBDXSS/Window-Normalization.git.
translated by 谷歌翻译
We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder ViT-Huge~\cite{dosovitskiy2020image}, a DETR variant DINO~\cite{zhang2022dino}, and an efficient DETR training method Group DETR~\cite{chen2022group}. The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO. Group DETR v2 achieves $\textbf{64.5}$ mAP on COCO test-dev, and establishes a new SoTA on the COCO leaderboard https://paperswithcode.com/sota/object-detection-on-coco
translated by 谷歌翻译
Deep learning (DL) methods have been widely applied to anomaly-based network intrusion detection system (NIDS) to detect malicious traffic. To expand the usage scenarios of DL-based methods, the federated learning (FL) framework allows multiple users to train a global model on the basis of respecting individual data privacy. However, it has not yet been systematically evaluated how robust FL-based NIDSs are against existing privacy attacks under existing defenses. To address this issue, we propose two privacy evaluation metrics designed for FL-based NIDSs, including (1) privacy score that evaluates the similarity between the original and recovered traffic features using reconstruction attacks, and (2) evasion rate against NIDSs using Generative Adversarial Network-based adversarial attack with the reconstructed benign traffic. We conduct experiments to show that existing defenses provide little protection that the corresponding adversarial traffic can even evade the SOTA NIDS Kitsune. To defend against such attacks and build a more robust FL-based NIDS, we further propose FedDef, a novel optimization-based input perturbation defense strategy with theoretical guarantee. It achieves both high utility by minimizing the gradient distance and strong privacy protection by maximizing the input distance. We experimentally evaluate four existing defenses on four datasets and show that our defense outperforms all the baselines in terms of privacy protection with up to 7 times higher privacy score, while maintaining model accuracy loss within 3% under optimal parameter combination.
translated by 谷歌翻译
大脑计算机界面(BCI)提供了人脑和外部设备之间的直接通信途径。在新受试者可以使用BCI之前,通常需要进行校准程序。因为间和受试者内部的差异是如此之大,以至于由现有受试者训练的模型在新受试者方面的表现不佳。因此,有效的主题转移和校准方法至关重要。在本文中,我们提出了一种半监督的元学习(SSML)方法,用于BCIS的主题转移学习。拟议的SSML首先学习了具有现有受试者的元模型,然后以半监督的学习方式对模型进行微调,即使用很少的标记和许多未标记的目标对象样本进行校准。对于标记数据稀缺或昂贵的同时,无标记数据的BCI应用程序非常重要。为了验证SSML方法,测试了三种不同的BCI范例:1)与事件相关的潜在检测; 2)情绪识别; 3)睡眠舞台。 SSML在前两个范式上取得了显着提高15%,而第三个范式则达到4.9%。实验结果证明了SSML方法在BCI应用中的有效性和潜力。
translated by 谷歌翻译
我们介绍了自回归文本到图像(Parti)模型的途径,该模型生成高保真的影像图像并支持涉及复杂组成和世界知识的内容丰富的合成。 Parti将文本对图像生成视为类似于机器翻译的序列到序列建模问题,图像令牌的序列是目标输出,而不是其他语言的文本令牌。这种策略自然可以利用大型语言模型的先前工作,通过扩展数据和模型尺寸,能力和性能的持续进展。我们的方法很简单:首先,Parti使用基于变压器的图像令牌VIT-VQGAN将图像编码为离散令牌的序列。其次,我们通过将编码器二次变压器模型缩放到20B参数来实现一致的质量改进,其新的最新零弹药FID得分为7.23,而MS-Coco的FIDED得分为3.22。我们对本地化叙述以及党的详细分析(P2),这是1600多个英语提示的新的整体基准,证明了Parti在各种类别和难度方面的有效性。我们还探索并突出了我们的模型的局限性,以定义和体现关注重点领域以进一步改进。有关高分辨率图像,请参见https://parti.research.google/。
translated by 谷歌翻译
协作多代理增强学习(MARL)已在许多实际应用中广泛使用,在许多实际应用中,每个代理商都根据自己的观察做出决定。大多数主流方法在对分散的局部实用程序函数进行建模时,将每个局部观察结果视为完整的。但是,他们忽略了这样一个事实,即可以将局部观察信息进一步分为几个实体,只有一部分实体有助于建模推理。此外,不同实体的重要性可能会随着时间而变化。为了提高分散政策的性能,使用注意机制用于捕获本地信息的特征。然而,现有的注意模型依赖于密集的完全连接的图,并且无法更好地感知重要状态。为此,我们提出了一个稀疏的状态MARL(S2RL)框架,该框架利用稀疏的注意机制将无关的信息丢弃在局部观察中。通过自我注意力和稀疏注意机制估算局部效用函数,然后将其合并为标准的关节价值函数和中央评论家的辅助关节价值函数。我们将S2RL框架设计为即插即用的模块,使其足够一般,可以应用于各种方法。关于Starcraft II的广泛实验表明,S2RL可以显着提高许多最新方法的性能。
translated by 谷歌翻译
最近,自主驾驶社会上有许多进展,吸引了学术界和工业的很多关注。然而,现有的作品主要专注于汽车,自动驾驶卡车算法和模型仍然需要额外的开发。在本文中,我们介绍了智能自动驾驶卡车系统。我们所呈现的系统由三个主要组成部分组成,1)一个现实的交通仿真模块,用于在测试场景中产生现实的交通流量,2)设计和评估了在现实世界部署中模仿实际卡车响应的高保真卡车模型,3 )具有基于学习的决策算法和多模轨迹策划仪的智能计划模块,考虑到卡车的约束,道路斜率变化和周围的交通流量。我们为每个组分单独提供定量评估,以证明每个部件的保真度和性能。我们还将我们的建议系统部署在真正的卡车上,并进行真实的世界实验,表明我们的系统能力缓解了SIM-TO-REAL差距。我们的代码可以在https://github.com/inceptioresearch/iits提供
translated by 谷歌翻译
我们总结了使用巨大的自动语音识别(ASR)模型的大量努力的结果,该模型使用包含大约一百万小时音频的大型,多样的未标记数据集进行了预训练。我们发现,即使对于拥有数万个小时的标记数据的非常大的任务,预训练,自我培训和扩大模型大小的组合也大大提高了数据效率。特别是,在具有34K小时标记数据的ASR任务上,通过微调80亿个参数预先训练的构象异构体模型,我们可以匹配最先进的(SOTA)性能(SOTA)的性能,只有3%的培训数据和通过完整的训练集可以显着改善SOTA。我们还报告了从使用大型预训练和自我训练的模型来完成一系列下游任务所获得的普遍利益,这些任务涵盖了广泛的语音域,并涵盖了多个数据集大小的大小,包括在许多人中获得SOTA性能公共基准。此外,我们利用预先训练的网络的学会表示,在非ASR任务上实现SOTA结果。
translated by 谷歌翻译
深度神经网络在许多以数据驱动和预测为导向的应用中表现出了出色的性能,有时甚至比人类表现更好。但是,他们最重要的缺点是缺乏解释性,这使得它们在许多现实世界中的吸引力降低了。当与犯罪判断,财务分析和医学诊断等不确定的道德问题或环境因素有关时,必须挖掘模型预测(解释模型知识)的证据,以说服人类。因此,研究如何解释模型知识对于学术研究和实际应用都至关重要。
translated by 谷歌翻译
Recent progress in geometric computer vision has shown significant advances in reconstruction and novel view rendering from multiple views by capturing the scene as a neural radiance field. Such approaches have changed the paradigm of reconstruction but need a plethora of views and do not make use of object shape priors. On the other hand, deep learning has shown how to use priors in order to infer shape from single images. Such approaches, though, require that the object is reconstructed in a canonical pose or assume that object pose is known during training. In this paper, we address the problem of how to compute equivariant priors for reconstruction from a few images, given the relative poses of the cameras. Our proposed reconstruction is $SE(3)$-gauge equivariant, meaning that it is equivariant to the choice of world frame. To achieve this, we make two novel contributions to light field processing: we define light field convolution and we show how it can be approximated by intra-view $SE(2)$ convolutions because the original light field convolution is computationally and memory-wise intractable; we design a map from the light field to $\mathbb{R}^3$ that is equivariant to the transformation of the world frame and to the rotation of the views. We demonstrate equivariance by obtaining robust results in roto-translated datasets without performing transformation augmentation.
translated by 谷歌翻译